Add Zarr v2 archive format support by egparedes · Pull Request #286 · GridTools/serialbox

egparedes · 2026-02-18T22:12:16Z

Summary

This PR adds support for the Zarr v2 storage format as a new archive backend in Serialbox. Zarr is a cloud-friendly, chunked array storage format that enables efficient I/O and interoperability with scientific Python tools.

Key Changes

New ZarrArchive class (src/serialbox/core/archive/ZarrArchive.h/cpp):
- Implements the Archive interface for Zarr v2 format
- Stores each field as a Zarr array in a subdirectory named <prefix>_<field>.zarr/
- Supports multiple saves of the same field with the first dimension representing the save index
- Handles data serialization/deserialization with proper byte-order handling
- Includes metadata management via JSON files (.zarray for Zarr metadata, ArchiveMetaData-<prefix>.json for Serialbox metadata)
Core Features:
- Supports multiple data types: Boolean, Int32, Int64, Float32, Float64
- Handles non-contiguous storage views through column-major iteration
- Implements endianness detection for proper dtype specification
- Provides both archive-based I/O (with save IDs) and direct file-based I/O (writeToFile/readFromFile)
- Proper error handling and validation of archive metadata
Integration:
- Updated ArchiveFactory to recognize and create Zarr archives
- Added comprehensive unit tests covering construction, metadata handling, read/write operations, and various data types/dimensions
- Updated CMake build configuration to include new source files
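
For illustration, here is a minimal sketch of how the `.zarray` metadata for one field might be assembled, assuming one chunk per save (chunk shape `[1, dims...]`) and the save index as the leading dimension. The helper names `jsonArray` and `makeZarrayJson` are hypothetical and not taken from the PR. The key names follow the Zarr v2 specification, while `"order": "C"` and `"fill_value": 0` are assumptions about what the implementation writes.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: formats integers as a JSON array, e.g. [2, 3, 4].
std::string jsonArray(const std::vector<std::size_t>& values) {
  std::ostringstream out;
  out << "[";
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i > 0) out << ", ";
    out << values[i];
  }
  out << "]";
  return out.str();
}

// Hypothetical helper: builds the .zarray content for a field that has been
// saved `numSaves` times, each save having the shape `dims`.
std::string makeZarrayJson(std::size_t numSaves,
                           const std::vector<std::size_t>& dims,
                           const std::string& dtype) {
  // The leading dimension counts the saves.
  std::vector<std::size_t> shape{numSaves};
  shape.insert(shape.end(), dims.begin(), dims.end());

  // Assumption: one chunk holds exactly one save.
  std::vector<std::size_t> chunks{1};
  chunks.insert(chunks.end(), dims.begin(), dims.end());

  std::ostringstream out;
  out << "{\n"
      << "  \"zarr_format\": 2,\n"
      << "  \"shape\": " << jsonArray(shape) << ",\n"
      << "  \"chunks\": " << jsonArray(chunks) << ",\n"
      << "  \"dtype\": \"" << dtype << "\",\n"
      << "  \"compressor\": null,\n"   // data is stored uncompressed
      << "  \"fill_value\": 0,\n"      // assumed value
      << "  \"order\": \"C\",\n"       // assumed memory order
      << "  \"filters\": null\n"
      << "}\n";
  return out.str();
}
```

Under these assumptions, appending a save only changes the first entry of `shape`; the chunk shape stays fixed.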

Directory Layout:

<directory>/
  ArchiveMetaData-<prefix>.json
  <prefix>_<field>.zarr/
    .zarray                    (Zarr metadata)
    0.0.0...0                  (chunk for save 0)
    1.0.0...0                  (chunk for save 1)
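
The chunk naming in the layout above can be sketched as follows: the save ID is the first index and every data dimension contributes a `0`, because each save is assumed to occupy a single chunk. `chunkName` and `chunkPath` are hypothetical helper names used only for this illustration.

```cpp
#include <cstddef>
#include <filesystem>
#include <string>

// Hypothetical helper: chunk key for save `saveId` of a field with
// `numDataDims` data dimensions, e.g. (1, 3) gives "1.0.0.0".
std::string chunkName(std::size_t saveId, std::size_t numDataDims) {
  std::string name = std::to_string(saveId);
  for (std::size_t d = 0; d < numDataDims; ++d) {
    name += ".0";
  }
  return name;
}

// Hypothetical helper: full path of a chunk file following the
// <directory>/<prefix>_<field>.zarr/<chunk> layout.
std::filesystem::path chunkPath(const std::filesystem::path& directory,
                                const std::string& prefix,
                                const std::string& field,
                                std::size_t saveId,
                                std::size_t numDataDims) {
  return directory / (prefix + "_" + field + ".zarr") /
         chunkName(saveId, numDataDims);
}
```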

Implementation Details

- Data is stored without compression in native byte order for simplicity and performance.
- Chunk naming follows the Zarr v2 specification, with indices separated by dots.
- The implementation handles arrays of varying dimensionality (2D to 7D tested).
- Both row-major and column-major storage layouts are supported through the StorageView abstraction.
- Thread safety is not currently supported (marked as false in the implementation).
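
As a rough sketch of the native byte-order handling, the following shows one way to detect endianness in plain C++17 and derive a Zarr v2 dtype string from it. The `ElemType` enum and the function names are stand-ins invented for this example, not Serialbox's actual type identifiers. The dtype strings use the NumPy-style codes from the Zarr v2 specification (`<` little-endian, `>` big-endian, `|` not applicable).

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

// Stand-in for the archive's element type identifier (hypothetical).
enum class ElemType { Boolean, Int32, Int64, Float32, Float64 };

// Detects the host byte order by inspecting the first byte of a 16-bit 1.
bool isLittleEndian() {
  const std::uint16_t probe = 1;
  unsigned char firstByte = 0;
  std::memcpy(&firstByte, &probe, 1);
  return firstByte == 1;
}

// Maps an element type to a Zarr v2 dtype string in native byte order.
std::string zarrDtype(ElemType type) {
  const std::string order = isLittleEndian() ? "<" : ">";
  switch (type) {
  case ElemType::Boolean:
    return "|b1"; // single byte, so byte order does not apply
  case ElemType::Int32:
    return order + "i4";
  case ElemType::Int64:
    return order + "i8";
  case ElemType::Float32:
    return order + "f4";
  case ElemType::Float64:
    return order + "f8";
  }
  throw std::invalid_argument("unsupported element type");
}
```

On a little-endian host, `zarrDtype(ElemType::Float64)` yields `<f8`, which Python readers interpret as a little-endian 64-bit float.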

https://claude.ai/code/session_012z6neCsMd8cRDKcYFRaqrz

Implements a native Zarr v2 archive (ZarrArchive) that requires no external library beyond standard C++17. Each field is stored as a Zarr array in a subdirectory named `<prefix>_<field>.zarr/` under the archive directory. Multiple saves of the same field are tracked via a leading time dimension; individual saves map to separate chunk files (`<id>.0.0...0`) following the Zarr v2 chunk naming convention.

Changes:
- src/serialbox/core/archive/ZarrArchive.h: new archive class
- src/serialbox/core/archive/ZarrArchive.cpp: full implementation
  * pure C++17, no external dependencies
  * native byte-order dtype strings in .zarray metadata
  * handles both contiguous and strided (padded) StorageViews
  * supports Read / Write / Append open modes
  * writeToFile / readFromFile for stateless single-save I/O
- src/serialbox/core/archive/ArchiveFactory.cpp: register Zarr; add .zarr extension mapping in archiveFromExtension
- src/serialbox/core/archive/ArchiveFactory.h: update docstring
- src/serialbox/core/CMakeLists.txt: compile ZarrArchive sources
- test/serialbox/core/archive/UnittestZarrArchive.cpp: unit tests mirroring the NetCDF test suite (construction, metadata validation, .zarray content, writeToFile/readFromFile, typed read/write round-trips)
- test/serialbox/core/CMakeLists.txt: include new test file

https://claude.ai/code/session_012z6neCsMd8cRDKcYFRaqrz
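
To make the strided (padded) case concrete, here is a minimal sketch of packing a strided view into a contiguous buffer by iterating with the first index varying fastest (column-major order). It uses raw dimensions and strides in units of elements rather than Serialbox's StorageView, and `packColumnMajor` is a hypothetical name; the PR's actual iteration code may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies the elements of a strided view into a
// contiguous buffer, visiting indices with the first dimension fastest.
// `strides` are given in elements, not bytes.
std::vector<double> packColumnMajor(const double* base,
                                    const std::vector<std::size_t>& dims,
                                    const std::vector<std::size_t>& strides) {
  assert(dims.size() == strides.size());

  std::size_t total = 1;
  for (std::size_t n : dims) total *= n;

  std::vector<double> packed;
  packed.reserve(total);

  std::vector<std::size_t> index(dims.size(), 0);
  for (std::size_t count = 0; count < total; ++count) {
    // Translate the multi-index into an offset that skips any padding.
    std::size_t offset = 0;
    for (std::size_t d = 0; d < dims.size(); ++d) {
      offset += index[d] * strides[d];
    }
    packed.push_back(base[offset]);

    // Advance the multi-index like an odometer, first dimension fastest.
    for (std::size_t d = 0; d < dims.size(); ++d) {
      if (++index[d] < dims[d]) break;
      index[d] = 0;
    }
  }
  return packed;
}
```

For a 2x3 view stored with a padded leading dimension of 4, the call would be `packColumnMajor(data, {2, 3}, {1, 4})`; the two padding elements per column are never read.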

Copilot

Pull request overview

This PR adds support for the Zarr v2 storage format to Serialbox as a new archive backend. Zarr is a cloud-friendly, chunked array storage format that enables efficient I/O and interoperability with scientific Python tools. The implementation follows established patterns from existing archive backends (BinaryArchive and NetCDFArchive) and integrates seamlessly with the existing ArchiveFactory infrastructure.

Changes:

- Implements a new ZarrArchive class that stores fields as Zarr v2 arrays in subdirectories, with support for multiple saves per field
- Integrates the Zarr archive into ArchiveFactory for archive creation and file extension resolution
- Adds comprehensive unit tests covering construction, metadata handling, read/write operations, and various data types/dimensions

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.

File	Description
src/serialbox/core/archive/ZarrArchive.h	Header defining the ZarrArchive class with Archive interface implementation, static utility methods, and helper functions
src/serialbox/core/archive/ZarrArchive.cpp	Implementation of ZarrArchive including endianness detection, data serialization, metadata management, and Zarr v2 format compliance
src/serialbox/core/archive/ArchiveFactory.h	Updated documentation to include .zarr extension mapping
src/serialbox/core/archive/ArchiveFactory.cpp	Integrated ZarrArchive into factory methods for creation and file I/O
src/serialbox/core/CMakeLists.txt	Added ZarrArchive source files to build configuration
test/serialbox/core/CMakeLists.txt	Added ZarrArchive unit test to test suite
test/serialbox/core/archive/UnittestZarrArchive.cpp	Comprehensive tests for ZarrArchive covering construction, metadata validation, read/write operations, and multiple data types/dimensions

Copilot · 2026-02-18T22:20:56Z

src/serialbox/core/archive/ZarrArchive.cpp

+  const std::size_t numDataDims = activeDims.size();
+
+  // Create directory
+  std::filesystem::create_directories(fieldDir);


The create_directories call should be wrapped in a try-catch block to handle std::filesystem::filesystem_error exceptions consistently with the constructor (lines 187-189) and the write method (lines 290-294). This ensures that filesystem errors are properly caught and converted to Serialbox Exception types with appropriate error messages.
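
A minimal sketch of the suggested pattern is shown below. `ArchiveError` is a stand-in for Serialbox's own Exception type, whose constructor is not shown in this thread, and `createFieldDirectory` is a hypothetical wrapper name; only `std::filesystem::create_directories` and `std::filesystem::filesystem_error` are taken from the standard library as-is.

```cpp
#include <filesystem>
#include <stdexcept>
#include <string>

// Stand-in for the project's exception type (hypothetical).
struct ArchiveError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Creates the field directory and converts filesystem failures into the
// archive's own exception type, with the offending path in the message.
void createFieldDirectory(const std::filesystem::path& fieldDir) {
  try {
    std::filesystem::create_directories(fieldDir);
  } catch (const std::filesystem::filesystem_error& e) {
    throw ArchiveError("cannot create field directory '" + fieldDir.string() +
                       "': " + e.what());
  }
}
```

With this shape, callers only need to handle one exception type regardless of whether the failure originated in the constructor, the write method, or the direct file-based path.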

egparedes and others added 2 commits February 18, 2026 18:08

Add test data zip archive

052da05

Copilot AI review requested due to automatic review settings February 18, 2026 22:12

egparedes closed this Feb 18, 2026

Copilot started reviewing on behalf of egparedes February 18, 2026 22:12 View session

Copilot AI reviewed Feb 18, 2026

egparedes wants to merge 2 commits into GridTools:master from EGPlace:claude/serialbox-netcdf-zarr-design-X584N
